Conversation
if not self.input_supported(input_type):
    raise ValueError("Input type not supported")

# If the input is already an audio path, pass it through unchanged.
if input_type == "audio_path":
    return ConverterResult(output_text=prompt, output_type="audio_path")
You must be thinking of a use case I am unable to anticipate 🙂 Can you elaborate?
Sometimes you want to generate attacks that include previous turns and then add new turns on top. The problem arises when those previous turns were audio and the new turn is based on a text prompt. You then have a mix of audio and text, and because the converter is attached to the target, all prompts go through it, so it throws an error when it tries to convert the audio file from the previous turns. You could account for this in the notebook at run time, but it's easier and cleaner to have the converter handle it and pass through anything that is already audio. I couldn't really see a downside to having this in the converter, but I'm keen to know if you can think of a problem this may cause.
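To make the mixed-history case concrete, here is a minimal sketch of the pass-through behaviour. It does not use the real library: `ConverterResult` is a local stand-in, and `synthesize_to_file`, `SUPPORTED_INPUT_TYPES`, and `convert` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConverterResult:
    """Local stand-in for the library's result type (assumption)."""
    output_text: str
    output_type: str


SUPPORTED_INPUT_TYPES = {"text", "audio_path"}  # hypothetical


def synthesize_to_file(text: str) -> str:
    """Placeholder for the real text-to-speech call; returns a fake path."""
    return f"/tmp/tts_{abs(hash(text)) % 10_000}.wav"


def convert(prompt: str, input_type: str = "text") -> ConverterResult:
    if input_type not in SUPPORTED_INPUT_TYPES:
        raise ValueError("Input type not supported")
    # Earlier turns that are already audio are returned unchanged.
    if input_type == "audio_path":
        return ConverterResult(output_text=prompt, output_type="audio_path")
    # Only text prompts are actually synthesized.
    return ConverterResult(output_text=synthesize_to_file(prompt), output_type="audio_path")


# A history that mixes an earlier audio turn with a new text turn.
history = [("turn1.wav", "audio_path"), ("Tell me more", "text")]
results = [convert(value, kind) for value, kind in history]
```

With this shape, the caller does not need to filter the history before attaching the converter; the audio turn comes back with the same path it went in with.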
logger = logging.getLogger(__name__)


class MultiLanguageTranslationConverter(PromptConverter):
This is actually doable with the selective text converter + translation converter, see https://azure.github.io/PyRIT/code/converters/6_selectively_converting.html#example-7-applying-converters-to-different-parts
I do see the appeal of having a shortcut, though. Wdyt?
Yeah, so I spent some time on this. My reasoning for having a separate converter is:
- Without digging into the docs, it's not easy to see how to do this type of splitting, so it's not easily discoverable.
- Implementing it in a notebook flow is a bit cumbersome, especially when you want to try a lot of different approaches. Having a converter makes it cleaner and easier to implement, but I'm happy to change course if others disagree.
- Baking the capability into the RandomTranslationConverter could be doable, but it would add a level of complexity to the converter, so I felt a separate one made sense from a maintainability point of view. Very happy to take guidance on this.
I'm wondering if we could do something like:
- Wrap the splitting and chaining logic in something like SequenceLevelConverter (effectively a generalized version of WordLevelConverter)
- Merge MultiLanguageTranslationConverter and RandomTranslationConverter, maybe inheriting this new SequenceLevelConverter, supporting both fixed/random language selection, and sequence/word splitting.
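A rough sketch of what the two bullets above could look like, under assumptions: the function names, the word-aligned splitting rule, and the `translate` callback are all hypothetical and are not taken from the PR or from the library. The callback stands in for whatever does the actual translation.

```python
import random
from typing import Callable, List, Optional


def split_into_segments(prompt: str, count: int) -> List[str]:
    """Split a prompt into at most `count` word-aligned segments of similar size."""
    words = prompt.split()
    count = max(1, min(count, len(words)))
    base, extra = divmod(len(words), count)
    segments, start = [], 0
    for i in range(count):
        # The first `extra` segments take one additional word each.
        end = start + base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:end]))
        start = end
    return segments


def translate_segments(
    prompt: str,
    languages: List[str],
    translate: Callable[[str, str], str],
    randomize: bool = False,
    rng: Optional[random.Random] = None,
) -> str:
    """Translate each segment with a fixed or randomly chosen language."""
    segments = split_into_segments(prompt, len(languages))
    rng = rng or random.Random()
    translated = []
    for i, segment in enumerate(segments):
        language = rng.choice(languages) if randomize else languages[i]
        translated.append(translate(segment, language))
    return " ".join(translated)


def fake_translate(segment: str, language: str) -> str:
    """Tags the segment instead of translating it, for demonstration only."""
    return f"[{language}] {segment}"


mixed = translate_segments("one two three four five", ["fr", "de"], fake_translate)
```

Setting `randomize=True` covers the random-language case with the same splitting code, which is the main argument for sharing one base.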
I would like to have @rlundeen2 chime in since he created STC. I briefly considered if this should be a shortcut to do what selective text converter does for this but it feels... not easier? Maybe because I'm already familiar with it. In any case, I don't see a case for having the implementation. At most, it should be an alias for using the selective text converter under the hood.
logger.info(
    "Multi-language translation complete: %d segments across languages %s",
    len(translated_segments),
    self.languages[: len(segments)],
nit:
- self.languages[: len(segments)],
+ self.languages[: len(self.languages)],
language = self.languages[i]

system_prompt = self._prompt_template.render_template_value(languages=language)
conversation_id = str(uuid.uuid4())
does it matter at all that all of these are going to be part of a different conversation?
Raises:
    ValueError: If speed_factor is not positive.
"""
if speed_factor <= 0:
Is there an upper bound?
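If an upper bound is wanted, the check could be extended along these lines. This is only a sketch: `MAX_SPEED_FACTOR` and its value of 10.0 are hypothetical and are not taken from the PR, and `validate_speed_factor` is an illustrative helper name.

```python
MAX_SPEED_FACTOR = 10.0  # hypothetical ceiling, not taken from the PR


def validate_speed_factor(speed_factor: float, max_factor: float = MAX_SPEED_FACTOR) -> float:
    """Reject non-positive values and values above the configured ceiling."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    if speed_factor > max_factor:
        raise ValueError(f"speed_factor must not exceed {max_factor}")
    return speed_factor
```

Whether a ceiling makes sense at all depends on how the resampling behaves at extreme factors, which the snippet under review does not show.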
    info = np.iinfo(data.dtype)
    max_val = float(info.max)
else:
    max_val = 1.0
Should info be assigned here? What happens on line 80 if it's not?
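Line 80 is not visible in this thread, so the following is a guess at the concern: if later code reads `info` after the `else` branch ran, the name is unbound. One way to avoid that, sketched here with hypothetical helper names (`peak_value`, `clip_to_range`) and assuming float audio is normalized to [-1.0, 1.0], is to resolve both bounds inside the branch that knows the dtype.

```python
import numpy as np


def peak_value(data: np.ndarray) -> float:
    """Return the largest representable sample value for the array's dtype."""
    if np.issubdtype(data.dtype, np.integer):
        return float(np.iinfo(data.dtype).max)
    # Assumption: float audio is normalized to [-1.0, 1.0].
    return 1.0


def clip_to_range(samples: np.ndarray, dtype) -> np.ndarray:
    """Clip and cast without reading a variable that only one branch defines."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        low, high = float(info.min), float(info.max)
    else:
        low, high = -1.0, 1.0
    return np.clip(samples, low, high).astype(dtype)
```

Both branches now produce `low` and `high`, so nothing downstream depends on `info` existing.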
Description
Added new audio converters to add the following:
Added a new translation converter, MultiLanguageTranslationConverter, to allow for mid-sentence language switching in a prompt.
Updated AzureSpeechTextToAudioConverter so that an audio file input is passed back out unchanged. This handles cases where the converters are used with conversation history that includes mixed audio and text Messages, which would otherwise throw exceptions.
Sorry I did not raise an issue for this ahead of time; experimentation with ideas turned into code and I wanted to contribute. Happy to refactor however is deemed best.
Tests and Documentation